On the performance of bisecting K-means and PDDP
Abstract
This paper focuses on the unsupervised clustering of a data-set. The data-set is given by the matrix $M = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{p \times N}$, where each column of $M$, $x_i \in \mathbb{R}^p$, is a single data-point. This is one of the more basic and common problems in fields such as pattern analysis, data mining, document retrieval, image segmentation and decision making ([12, 13]). The specific problem we want to solve herein is the partition of $M$ into two sub-matrices (or sub-clusters) $M_L \in \mathbb{R}^{p \times N_L}$ and $M_R \in \mathbb{R}^{p \times N_R}$, with $N_L + N_R = N$. This problem is known as bisecting divisive clustering. Note that by recursively applying a bisecting divisive clustering procedure, the data-set can be partitioned into any given number of clusters. Interestingly enough, the clusters so obtained are structured as a hierarchical binary tree (or a binary taxonomy). This is the reason why the bisecting divisive approach is very attractive in many applications (e.g. in document-retrieval/indexing problems; see [17] and references cited therein). Among the divisive clustering algorithms which have been proposed in the literature in the last two decades ([13]), in this paper we will focus on two techniques:
• the bisecting K-means algorithm;
• the Principal Direction Divisive Partitioning (PDDP) algorithm.
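The bisection step and its recursive use can be illustrated with a minimal sketch in plain Python. It is not taken from the paper: the function names, the power-iteration routine, the farthest-point initialization of 2-means and the "split the largest cluster" rule are all assumptions made for illustration. Data-points are stored as a list of p-dimensional tuples, i.e. the columns of $M$.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of p-dimensional points."""
    return [sum(col) / len(points) for col in zip(*points)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pddp_split(points, iters=100):
    """PDDP-style bisection: split by the sign of the projection of the
    centered data onto its principal direction."""
    w = centroid(points)
    centered = [[x - m for x, m in zip(pt, w)] for pt in points]
    # Power iteration for the leading eigenvector of the scatter matrix.
    # Assumption: start from the centered point of largest norm.
    u = max(centered, key=lambda c: dot(c, c))
    for _ in range(iters):
        proj = [dot(c, u) for c in centered]
        v = [sum(s * c[j] for s, c in zip(proj, centered)) for j in range(len(w))]
        norm = math.sqrt(dot(v, v))
        if norm == 0.0:  # degenerate case: all points coincide
            break
        u = [x / norm for x in v]
    left = [pt for pt, c in zip(points, centered) if dot(c, u) <= 0]
    right = [pt for pt, c in zip(points, centered) if dot(c, u) > 0]
    return left, right

def two_means_split(points, iters=100):
    """Bisecting K-means step: Lloyd iterations with K = 2."""
    c0 = points[0]
    # Assumption: farthest-point initialization instead of a random one.
    c1 = max(points, key=lambda pt: sqdist(pt, c0))
    assign = None
    for _ in range(iters):
        new_assign = [sqdist(pt, c0) <= sqdist(pt, c1) for pt in points]
        if new_assign == assign:  # assignments are stable: converged
            break
        assign = new_assign
        left = [pt for pt, a in zip(points, assign) if a]
        right = [pt for pt, a in zip(points, assign) if not a]
        if not left or not right:  # one side is empty: stop updating
            break
        c0, c1 = centroid(left), centroid(right)
    left = [pt for pt, a in zip(points, assign) if a]
    right = [pt for pt, a in zip(points, assign) if not a]
    return left, right

def bisect_recursively(points, k, split):
    """Apply `split` repeatedly until k clusters exist.
    Assumption: the largest cluster is always the one split next."""
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)
        biggest = clusters.pop()
        left, right = split(biggest)
        if not left or not right:  # cluster cannot be split further
            clusters.append(biggest)
            break
        clusters += [left, right]
    return clusters
```

For example, `bisect_recursively(data, 4, pddp_split)` would return up to four clusters as lists of points. Recording each (parent, left, right) triple instead of a flat list would give the hierarchical binary tree mentioned above. In practice the principal direction would more typically be obtained from a singular value decomposition routine of a numerical library.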
Similar resources
On the performance of bisecting K-means and PDDP * Sergio
problem is known as bisecting divisive clustering. Note that by recursively using a divisive bisecting clustering procedure, the dataset can be partitioned into any given number of clusters. Interestingly enough, the clusters so-obtained are structured as a hierarchical binary tree (or a binary taxonomy). This is the reason why the bisecting divisive approach is very attractive in many applicat...
Bisecting K-means and PDDP: A Comparative Analysis
This paper deals with the problem of clustering a data-set. In particular, the bisecting divisive partitioning approach is here considered. We focus on two algorithms: the celebrated K-means algorithm, and the recently proposed Principal Direction Divisive Partitioning (PDDP) algorithm. A comparison of the two algorithms is given, under the assumption that the data set is uniformly distributed ...
A comparative analysis on the bisecting K-means and the PDDP clustering algorithms
This paper deals with the problem of clustering a data set. In particular, the bisecting divisive partitioning approach is here considered. We focus on two algorithms: the celebrated K-means algorithm, and the recently proposed Principal Direction Divisive Partitioning (PDDP) algorithm. A comparison of the two algorithms is given, under the assumption that the data set is uniformly distributed ...
Choosing the cluster to split in bisecting divisive clustering algorithms
This paper deals with the problem of clustering a data-set. In particular, the bisecting divisive approach is here considered. This approach can be naturally divided into two sub-problems: the problem of choosing which cluster must be divided, and the problem of splitting the selected cluster. The focus here is on the first problem. The contribution of this work is to propose a new simple techn...
Principal Direction Divisive Partitioning with kernels and k-means steering
Clustering is a fundamental task in data mining. We propose, implement and evaluate several schemes that combine partitioning and hierarchical algorithms, specifically k-means and Principal Direction Divisive Partitioning (PDDP). Using available theory regarding the solution of the clustering indicator vector problem, we use 2-means to induce partitionings around fixed or varying cut-points. 2-...
Publication date: 2001